Combining Supertagging and Lexicalized Tree-Adjoining Grammar Parsing

Author

  • Anoop Sarkar
Abstract

In this paper we study various reasons and mechanisms for combining Supertagging with Lexicalized Tree-Adjoining Grammar (LTAG) parsing. Because the LTAG formalism is highly lexicalized, we show experimentally that factors other than sentence length affect observed parse times. In particular, syntactic lexical ambiguity and sentence complexity (both terms are defined in this paper) are the dominant factors that affect parsing efficiency. We show how a Supertagger can be used to drastically reduce the syntactic lexical ambiguity of a given input, and how it can be combined with an LTAG parser to radically improve parsing efficiency. We then turn our attention from parsing efficiency to parsing accuracy and provide a method for effectively combining the output of a Supertagger and a statistical LTAG parser, using a co-training algorithm to bootstrap new labeled data. This combination method can be used to incorporate new labeled data from raw text to improve parsing accuracy.
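As a rough illustration of how a Supertagger can cut down the number of elementary trees an LTAG parser has to consider, the sketch below keeps only the n best-scoring trees per word and compares the tree counts before and after. It is a minimal sketch under stated assumptions, not the paper's method: the toy lexicon, the tree names, the scores, and the helper names `trees_selected` and `n_best_filter` are all hypothetical, and counting candidate trees per sentence is only a stand-in for the paper's own definition of syntactic lexical ambiguity, which is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each word maps to (elementary tree name, score) pairs.
# Tree names and scores are invented for illustration; a real Supertagger
# would assign scores in the context of the whole sentence.
Candidates = Dict[str, List[Tuple[str, float]]]

TOY_LEXICON: Candidates = {
    "the": [("tree_det", 0.97), ("tree_det_alt", 0.03)],
    "dog": [("tree_noun", 0.80), ("tree_noun_mod", 0.15), ("tree_noun_alt", 0.05)],
    "barks": [
        ("tree_intrans", 0.70),
        ("tree_noun", 0.20),
        ("tree_noun_mod", 0.06),
        ("tree_trans", 0.04),
    ],
}


def trees_selected(sentence: List[str], lexicon: Candidates) -> int:
    """Total number of candidate elementary trees over all tokens (assumed proxy)."""
    return sum(len(lexicon[word]) for word in sentence)


def n_best_filter(sentence: List[str], lexicon: Candidates, n: int) -> Candidates:
    """Keep only the n highest-scoring candidate trees for each word."""
    return {
        word: sorted(lexicon[word], key=lambda pair: pair[1], reverse=True)[:n]
        for word in sentence
    }


sentence = ["the", "dog", "barks"]
before = trees_selected(sentence, TOY_LEXICON)
after = trees_selected(sentence, n_best_filter(sentence, TOY_LEXICON, n=1))
print(f"candidate trees before filtering: {before}, after 1-best filtering: {after}")
```

A smaller n gives the parser fewer trees to combine, at the risk of discarding the correct tree for some word; that trade-off is the reason for treating n as a tunable parameter in this sketch.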


Related resources

TOWARDS EFFICIENT STATISTICAL PARSING USING LEXICALIZED GRAMMATICAL INFORMATION

For a long time, the goal of wide-coverage natural language parsers had remained elusive. Much progress has been made recently, however, with the development of lexicalized statistical models of natural language parsing. Although lexicalized tree adjoining grammar (TAG) is a lexicalized grammatical formalism whose development predates these recent advances, its application in lexicalized statis...


Automated Extraction of TAGs from the Penn Treebank

The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using lexicalized tree adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that this is due in large part to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad coverage LTAG. Our work attempts to a...


Some Experiments on Indicators of Parsing Complexity for Lexicalized Grammars

In this paper, we identify syntactic lexical ambiguity and sentence complexity as factors that contribute to parsing complexity in fully lexicalized grammar formalisms such as Lexicalized Tree Adjoining Grammars. We also report on experiments that explore the effects of these factors on parsing complexity. We discuss how these constraints can be exploited in improving efficiency of parsers for ...


Integrated Supertagging and Parsing

Parsing is the task of assigning syntactic or semantic structure to a natural language sentence. This thesis focuses on syntactic parsing with Combinatory Categorial Grammar (CCG; Steedman 2000). CCG allows incremental processing, which is essential for speech recognition and some machine translation models, and it can build semantic structure in tandem with syntactic parsing. Supertagging solv...


Supertagging: Introduction, learning, and application

Supertagging is an approach originally developed by Bangalore and Joshi (1999) to improve parsing efficiency. In the beginning, researchers used small training datasets and somewhat naïve smoothing techniques to learn the probability distributions of supertags. Since its inception, the applicability of supertags has been explored for the TAG (tree-adjoining grammar) formalism as well as othe...




Publication date: 2006